International Journal of Trend in Scientific Research and Development, Volume 1(4), ISSN: 2456-6470 

www.ijtsrd.com 

Data Mining For Supermarket Sale Analysis Using Association Rule 


Mrs. R. R. Shelke 

H.V.P.M. COET, Amravati 


Dr. R. V. Dharaskar 

Former Director 

DES (Disha - DIMAT) Group of 
Institutes, Raipur 


Dr. V. M. Thakare 

Prof. and Head,
Computer Science Dept., SGB 
Amravati University, Amravati 


ABSTRACT 

Data mining is the technology of discovering important information in data repositories, and it is widely used in almost all fields. Recently, mining of databases has become essential because of the growing amount of data and its wide applicability in retail industries for improving marketing strategies. Analysis of past transaction data can provide valuable information on customer behavior and support business decisions. The amount of data stored grows twice as fast as the speed of the fastest processor available to analyze it. The main purpose of association rule mining is to find association relationships among a large number of database items. It is used to describe the patterns of customers' purchases in the supermarket. This paper presents such an analysis.

I. INTRODUCTION

Data mining tasks can be classified into two categories: descriptive mining and predictive mining. Descriptive mining describes the essential characteristics of the data in the database. Clustering, association, and sequential mining are the main descriptive mining tasks. Predictive mining deduces patterns from the data in order to make predictions. Predictive mining techniques include tasks such as classification, regression, and deviation detection. Mining frequent itemsets from transaction databases is a fundamental task for several forms of knowledge discovery, such as association rules, sequential patterns, and classification. An itemset is frequent if its items occur together in a sufficiently large share of the transactions. Frequent itemsets are generally used to generate association rules. The objective of frequent itemset mining is to identify items that co-occur in the transaction database with a frequency above a user-given threshold. Association rule mining is one of the principal problems treated in knowledge discovery in databases (KDD) and can be defined as extracting interesting correlations and relations among a huge number of transactions.
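The notion of a frequent itemset can be made concrete with a minimal sketch. The transactions, item names, and the 0.5 threshold below are invented for illustration and do not come from the supermarket dataset used in this paper; the brute-force enumeration of pairs is only meant to show what "co-occur above a user-given frequency" means.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

# Toy transaction database (hypothetical data, not from the paper).
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter", "milk"}),
    frozenset({"milk", "eggs"}),
    frozenset({"bread", "butter"}),
]

min_support = 0.5  # user-given threshold (assumed value)

# Enumerate every pair of items and keep those meeting the threshold.
items = sorted(set().union(*transactions))
frequent_pairs = [
    pair for pair in combinations(items, 2)
    if support(pair, transactions) >= min_support
]
print(frequent_pairs)
```

Enumerating all pairs is only feasible for tiny examples; the algorithms reviewed in the next section exist precisely to avoid this exhaustive search.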

II. LITERATURE REVIEW 

Association rule discovery has become a core topic in data mining and attracts attention because of its wide applicability. Association rule mining is normally performed in two steps, frequent itemset generation and rule generation, for which many researchers have presented efficient algorithms [1-5]. T. Karthikeyan and N. Ravikumar give a theoretical survey of some of the existing algorithms [3]. They first present the concepts behind association rules, then give an overview of previous research in this area, discuss the advantages and limitations, and conclude with an inference. Association rule mining discovers frequent patterns among itemsets. It aims to extract interesting associations, frequent patterns, and correlations among sets of items in data repositories. For example, in a laptop store in India, 80% of the customers who buy laptop computers also buy a data card for internet access and a pen drive for data portability.
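A figure such as the 80% in the laptop example is the confidence of a rule: among the transactions that contain the antecedent, the share that also contain the consequent. The sketch below shows the computation on ten invented transactions chosen so that the result matches the 80% of the example; the item names are placeholders.

```python
def confidence(antecedent, consequent, transactions):
    """Share of transactions containing `antecedent` that also contain `consequent`."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    n_a = sum(1 for t in transactions if a <= t)
    n_both = sum(1 for t in transactions if both <= t)
    return n_both / n_a if n_a else 0.0

# Hypothetical store data: 5 laptop purchases, 4 of them with both accessories.
transactions = (
    [frozenset({"laptop", "data card", "pen drive"})] * 4
    + [frozenset({"laptop"})]
    + [frozenset({"pen drive"})] * 5
)

conf = confidence({"laptop"}, {"data card", "pen drive"}, transactions)
print(f"confidence = {conf:.0%}")
```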

Kinds of Association Rule Mining:

According to Udaiveer Singh Parmar, association rules can be distinguished by the number of data dimensions involved [6]:

A. Single-dimensional association rule: An association rule is single-dimensional if the items or attributes in the rule reference only one dimension.

B. Multidimensional association rule: If a rule references more than one dimension, such as study-level, income, and buys, then it is a multidimensional association rule. Let X be an item set; the following is an example of a multidimensional rule:

Study-Level(X, “20...25”) ^ Income(X, “30K...40K”) -> buys(X, “performant computer”)

Based on the types of values handled by the rule, we can distinguish two types of association rules:

Boolean association rule: A rule is a Boolean association rule if it involves associations between the presence or absence of items. For example, the following is a Boolean association rule obtained from market basket analysis:

buys(X, “computer”) -> buys(X, “scanner”)

IJTSRD | May-Jun 2017 
Available Online @www.ijtsrd.com 


Quantitative association rule: A rule is called a quantitative association rule if it describes associations between quantitative items or attributes. In these rules, quantitative values for items or attributes are partitioned into intervals.
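Partitioning into intervals can be sketched as follows. The attribute names, interval edges, and label format are assumptions made for this illustration; the idea is only that each numeric value is replaced by the interval it falls into, so that the result can be treated like an ordinary item.

```python
def interval_item(attribute, value, edges):
    """Map a numeric value to an item label such as 'income=[30000,40000)'."""
    for low, high in zip(edges, edges[1:]):
        if low <= value < high:
            return f"{attribute}=[{low},{high})"
    return f"{attribute}=other"  # value lies outside all listed intervals

# Hypothetical customer record and interval edges.
record = {"age": 23, "income": 35000}
edges = {"age": [20, 26, 40], "income": [20000, 30000, 40000]}

# Each numeric attribute becomes one interval-valued item.
items = {interval_item(name, value, edges[name]) for name, value in record.items()}
print(sorted(items))
```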

Over the past decades, several efforts have been made to discover scalable and efficient methods for mining frequent association rules (ARs). However, mining the least frequent ARs has been left behind. As a result, ARs that are rarely found in the database are pruned out by the minimum support-confidence threshold. In fact, rare ARs can also reveal useful information for detecting highly critical and exceptional situations. One suggested method mines ARs by considering only infrequent itemsets. Its drawback is that the Matrix-based Scheme (MBS) and Hash-based Scheme (HBS) algorithms are very expensive in terms of hash collisions. Ding proposed the Transactional Co-occurrence Matrix (TCOM) for mining association rules among rare items. However, its implementation is quite complex and costly. Yun, et al., [7] introduced the Relative Support Apriori Algorithm (RSAA) to generate rare itemsets. The challenge is that it takes about as long as Apriori if the allowable minimum support is set very low. Koh, et al., [8] suggested the Apriori-Inverse algorithm to mine infrequent itemsets without generating any frequent rules. However, it suffers from candidate itemset generation and is costly in generating the rare ARs. The Multiple Support Apriori (MSApriori) algorithm has been used to extract rare ARs. In actual implementation, this algorithm faces the “rare item problem”. Many of these approaches use a percentage-based approach to improve on the performance of single minimum support based approaches.

Lift and chi-square have been introduced as objective correlation measures for ARs. Lift compares the frequency of a pattern against a baseline frequency computed under the statistical independence assumption. Omiecinski proposed two interestingness measures based on the downward closure property, called all-confidence and bond [9]. There are two algorithms for mining all-confidence and bond correlation patterns that extend the pattern-growth methodology. In terms of mining algorithms, Agrawal, et al., [10] proposed the first AR mining algorithm, called Apriori. The main bottleneck of Apriori is that it requires multiple scans of the transaction database and generates a huge number of candidate itemsets. AlhasanBala et al. [11] suggested the FP-Growth algorithm, which overcomes both limitations faced by the Apriori family of algorithms. Currently, FP-Growth is one of the fastest and most benchmarked algorithms for frequent itemset mining.
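The three measures mentioned above can be illustrated with a short sketch using their usual definitions: lift is the joint support divided by the product of the individual supports, all-confidence is the support of an itemset divided by the largest support of any single item in it, and bond is the number of transactions containing all items divided by the number containing at least one of them. The four transactions are invented for the example.

```python
def count(itemset, transactions):
    """Number of transactions containing every item in `itemset`."""
    s = frozenset(itemset)
    return sum(1 for t in transactions if s <= t)

def lift(a, b, transactions):
    """Joint support divided by the support expected under independence."""
    n = len(transactions)
    n_a, n_b = count(a, transactions), count(b, transactions)
    n_ab = count(frozenset(a) | frozenset(b), transactions)
    return (n_ab * n) / (n_a * n_b) if n_a and n_b else 0.0

def all_confidence(itemset, transactions):
    """Support of the itemset divided by the largest single-item support."""
    largest = max(count({i}, transactions) for i in itemset)
    return count(itemset, transactions) / largest if largest else 0.0

def bond(itemset, transactions):
    """Transactions containing all items divided by those containing any item."""
    s = frozenset(itemset)
    n_any = sum(1 for t in transactions if s & t)
    return count(s, transactions) / n_any if n_any else 0.0

# Hypothetical transactions.
T = [frozenset("ab"), frozenset("abc"), frozenset("a"), frozenset("c")]

print(lift({"a"}, {"b"}, T), all_confidence({"a", "b"}, T), bond({"a", "b"}, T))
```

A lift above 1 indicates that the two sides occur together more often than independence would predict; all-confidence and bond both lie between 0 and 1.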

III. PROPOSED SYSTEM FOR ASSOCIATION RULE MINING

One of the most popular data mining methods is to discover frequent itemsets from a transaction dataset and derive association rules from them. Finding frequent itemsets (itemsets with frequency larger than or equal to a user-defined minimum support) is challenging because the number of candidate itemsets grows combinatorially. Once frequent itemsets are found, it is straightforward to produce association rules with confidence greater than or equal to a user-stated minimum confidence. Apriori is an important algorithm for finding frequent itemsets using candidate generation. It is a level-wise complete search algorithm that uses the anti-monotonicity of itemsets, i.e. “if an itemset is not frequent, none of its supersets is frequent”.
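The level-wise search and the anti-monotonicity pruning can be sketched in a few lines. This is a simplified textbook-style version of Apriori, not the implementation used in the proposed system; the function name, the toy transactions, and the threshold are assumptions for illustration.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent_among(candidates):
        # One scan of the database per level to count the candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: k / n for c, k in counts.items() if k / n >= min_support}

    level = frequent_among({frozenset([i]) for t in transactions for i in t})
    result = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop candidates with any infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        level = frequent_among(candidates)
        result.update(level)
        k += 1
    return result

# Hypothetical transactions.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"milk", "eggs"},
    {"bread", "butter"},
]
frequent = apriori(baskets, min_support=0.5)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

In this example the candidate {bread, butter, milk} is never counted against the database, because its subset {butter, milk} is already known to be infrequent.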

The algorithm used for association in the proposed system is as follows:

1. Get the unique product ID and product name from the product and sales tables where the product IDs of the two tables match.

2. Convert all product names and product IDs into two arrays.

3. Retrieve the product class ID for each product from the product table.

4. Retrieve the unique product category for each product class ID from the product category table.

5. Create an array of product categories.

6. Provide the product category array to the Apriori algorithm.

7. Store the results of Apriori in the database table finaloutput.
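The steps above can be sketched against an in-memory SQLite database. Everything in the schema except the table name finaloutput is an assumption: the paper does not give column names, so product_id, product_class_id, category, and in particular basket_id (used here to group sales rows into transactions) are hypothetical, as are the sample rows. For brevity, step 6 is replaced by simple counting of category pairs rather than a full Apriori run.

```python
import sqlite3
from collections import Counter
from itertools import combinations

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (product_id INTEGER, product_name TEXT, product_class_id INTEGER);
CREATE TABLE product_category (product_class_id INTEGER, category TEXT);
CREATE TABLE sales (basket_id INTEGER, product_id INTEGER);
CREATE TABLE finaloutput (item_a TEXT, item_b TEXT, support REAL);
INSERT INTO product VALUES (1, 'Product A', 10), (2, 'Product B', 20), (3, 'Product C', 30);
INSERT INTO product_category VALUES (10, 'Fruit'), (20, 'Snack Foods'), (30, 'Seafood');
INSERT INTO sales VALUES (100, 1), (100, 2), (101, 1), (101, 2), (101, 3), (102, 3);
""")

# Steps 1-5: match sales to products, then to categories, one set per basket.
rows = conn.execute("""
    SELECT DISTINCT s.basket_id, c.category
    FROM sales s
    JOIN product p ON p.product_id = s.product_id
    JOIN product_category c ON c.product_class_id = p.product_class_id
""").fetchall()
baskets = {}
for basket_id, category in rows:
    baskets.setdefault(basket_id, set()).add(category)

# Step 6 (simplified stand-in for Apriori): count co-occurring category pairs.
pair_counts = Counter()
for categories in baskets.values():
    pair_counts.update(combinations(sorted(categories), 2))

min_support = 0.5  # assumed threshold
n = len(baskets)
results = [
    (a, b, k / n) for (a, b), k in sorted(pair_counts.items()) if k / n >= min_support
]

# Step 7: store the surviving pairs in the finaloutput table.
conn.executemany("INSERT INTO finaloutput VALUES (?, ?, ?)", results)
conn.commit()
stored = conn.execute("SELECT item_a, item_b, support FROM finaloutput").fetchall()
print(stored)
```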

IV. RESULTS OF ASSOCIATIONS 

Market basket analysis is one of the most common and beneficial techniques of data analysis for marketing and retailing. The main purpose of market basket analysis is to determine which products are usually bought together by customers. Market basket analysis identifies the purchasing habits of customers. It offers insight into the combination of products within a customer's 'basket'. The term 'basket' normally applies to a single order. However, the analysis can be applied to other variations. In market basket analysis one can analyze combinations of products to be sold together, which is helpful for both the retailer and the manufacturing company. A store could use this information to place products frequently sold together in the same area, so that sales increase. To find association rules, the Apriori algorithm has been used on the Sales table of a supermarket dataset. Association rules have been generated from the frequent itemsets obtained.
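Generating rules from frequent itemsets can be sketched for the two-item case. The counts below are invented, and the function name and the 0.7 minimum confidence are assumptions; the sketch shows that one frequent pair can yield a rule in one direction but not in the other, depending on how often each side occurs on its own.

```python
def rules_from_pairs(pair_count, item_count, min_confidence):
    """Derive rules lhs -> rhs from frequent pairs, keeping those above the threshold."""
    rules = []
    for (a, b), n_ab in pair_count.items():
        for lhs, rhs in ((a, b), (b, a)):
            conf = n_ab / item_count[lhs]  # confidence of lhs -> rhs
            if conf >= min_confidence:
                rules.append((lhs, rhs, conf))
    return sorted(rules)

# Hypothetical counts over ten baskets.
item_count = {"Fruit": 6, "Snack Foods": 4}
pair_count = {("Fruit", "Snack Foods"): 3}

rules = rules_from_pairs(pair_count, item_count, min_confidence=0.7)
for lhs, rhs, conf in rules:
    print(f"{lhs} -> {rhs} (confidence {conf:.2f})")
```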

The result generated by applying association mining is shown in Fig. 1.

Data Mining: Associations Rules

Category: Fruit

Items: 1-Ebony Cantelope 2-Ebony Honey Dew 3-Ebony Fuji Apples 4- 5-Ebony Tangerines

Item Ebony Cantelope & Ebony Honey Dew are selected
Item Ebony Cantelope & Ebony Fuji Apples are selected
Item Ebony Cantelope & Ebony Oranges are selected
Item Ebony Cantelope & Ebony Tangerines are selected
Item Ebony Honey Dew & Ebony Fuji Apples are selected
Item Ebony Honey Dew & Ebony Oranges are selected
Item Ebony Honey Dew & Ebony Tangerines are selected
Item Ebony Fuji Apples & Ebony Oranges are selected
Item Ebony Fuji Apples & Ebony Tangerines are selected

Fig 1. Result of Associations

V. CONCLUSION

The rapid growth and advances of information technology enable data to be accumulated faster and in much larger quantities. Faced with vast new information resources, scientists, engineers, and business people need efficient analytical techniques to extract useful information and effectively uncover new, valuable knowledge patterns. In this paper, association rule mining for a supermarket dataset has been presented. Mining has been applied to the sales data of the dataset. In the proposed system, the Apriori algorithm has been used on the supermarket dataset, which gives associations of two products that have maximum support.